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f-^ | Abstract 

^ , It was recently demonstrated in [5] that the non-linear bilateral filter 

^^ ■ II14II can be efficiently implemented using a constant- time or O(l) algo- 

00 ' rithm. At the heart of this algorithm was the idea of approximating the 

Gaussian range kernel of the bilateral filter using trigonometric functions. 

>In this letter, we explain how the idea in O can be extended to few other 
linear and non-linear filters ||14l 1171 [2j ■ While some of these filters have 
\^ . received a lot of attention in recent years, they are known to be computa- 

^ ' tionally intensive. To extend the idea in [5], we identify a central property 

^J , of trigonometric functions, called shiftability, that allows us to exploit the re- 

dundancy inherent in the filtering operations. In particular, using shiftable 
kernels, we show how certain complex filtering can be reduced to simply 
that of computing the moving sum of a stack of images. Each image in the 
stack is obtained through an elementary pointwise transform of the input 
,_^ | image. This has a two-fold advantage. First, we can use fast recursive al- 

\Q , gorithms for computing the moving sum 1 15 , 6 ], and, secondly, we can use 

■^- ' parallel computation to further speed up the computation. We also show 

how shiftable kernels can also be used to approximate the (non-shiftable) 
i— n . Gaussian kernel that is ubiquitously used in image filtering. 

Keywords: Filtering, shiftability, kernel, 0(1) complexity, approximation, 
constant-time algorithm, moving sum, neighborhood filter, spatial filter, bilat- 
eral filter, non-local means. 
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ctf 1 Introduction 



The function cos(x) has the remarkable property that, for any translation r, 
cos(a; — t) = cos(t) cos(x)-|-sin(T) sin(x). That is, we can express the translate of 
a sinusoid as the linear combination of two fixed sinusoids. More generally, this 
holds true for any function of the form ip(x) — c\ exp(aia;) + • • ■ + c^ exp(aNx). 
This follows from the addition-multiplication property of the exponential. As 
special cases, we have the pure exponentials when a ; is real, and the trigono- 
metric functions when on is imaginary. 

The translation of not all functions can be written in this way, e.g., consider 
the functions exp(— x 2 ) and (1 + x 2 )^ 1 . The other class of functions that have 
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this property are the polynomials, ip(x) = c + C\x H \- c^x N . Functions with 

this property can be realized in higher dimensions using higher-dimensional 
polynomials and exponentials, or simply by taking the tensor product of the 
one-dimensional functions. 

More generally, we say that a function (p(x) is shiftable in R d if there ex- 
ists a fixed (finite) collection of functions tpi (x), . . . , <pn(x) such that, for every 
translation r in R d , we can write 

(p(x - t) = ci(r)(pi(x) H h c n (t)lp n (x). (1) 

We call the fixed functions <pi(x), . . . , ipn(x) the basis functions, ci(r), . . . , cat(t) 
the interpolating coefficients, and TV the order of shiftability Note that the coeffi- 
cients depend on r, and are responsible for capturing the translation action. 

Shiftable functions, and more generally, steerable functions flTl , have a 
long history in signal and image processing. Over the last few decades, re- 
searchers have found applications of these special functions in various domains 
such as image analysis QO and motion estimation [1], and pattern recogni- 
tion H16H . to name a few. In several of these applications, the role of steerability 
and its formal connection with the theory of Lie groups was only recognized 
much later. We refer the readers to the exposition of Teo [13] for an excellent 
account on the theory and practice of steerable functions. 

In a recent paper (S), we showed how specialized trigonometric kernels 
could be used for fast bilateral filtering. This work was inspired by the work 
of Porikli H10H . who had earlier shown how polynomials could be used for the 
same purpose. We now realize that it is the shiftability of the kernel that is at 
the heart of the matter, and that this can be applied to other forms of filtering 
and using more general kernels. We will provide a general theory of this in 
Section [21 where we also propose some algorithms that have a constant-time 
complexity per pixel, independent of the size of the filtering kernels. To the best 
of our knowledge, such algorithms have not been reported in the community 
in its full generality. The problem of designing shiftable kernels is addressed 
in Section [3j Here we also propose a scheme for approximating the ubiquitous 
Gaussian kernel using shiftable functions. Finally, in Section|4j we present some 
thoughts on how shiftability could be used for reducing the complexity of the 
non-local means filter fl2]| . 



2 Filtering using moving sums 

We now show how certain constant-time algorithms for image filtering can be 
obtained using shiftable kernels. The idea is that by using shiftability, we can 
decompose the local kernels (obtained using translations) in terms of "global" 
functions - the basis functions. The local information gets encoded in the inter- 
polating coefficients. This allows us to explicitly take advantage of the redun- 
dancy in the filtering operation. 



To begin with, we consider the simplest neighborhood filter, namely the 
spatial filter. This is given by 

J(x) = - [<p(y)f(x - y) dy (2) 

where 77 = J n f(y) dy. Here ip(x) is the kernel, and fi is the neighborhood over 
which the integral (sum) is taken. Note that, henceforth, we will use the term 
kernel to specifically mean that the function is symmetric, non-negative, and 
unimodal over its support (peak at the origin). It is not immediately clear that 
one can construct such kernels using shiftable functions. We will address this 
problem in the sequel. 

For the moment, suppose that the kernel <p(x) is shiftable, so that ([T]) holds. 
We use this, along with symmetry, to center the kernel at a;. In particular, we 
write <p(y) = <p(x - y - x) = J2 n =i Cn(x)ip n (x - y). Using linearity, we have 

/ f(y)f(x -y) dy = Y] c n (x) / ip n (x - y)f(x - y) dy. 
Jn n=1 Jn 

Similarly, n = J2n = i c n( x ) In V*( x ~ V) d V- 

Now consider the case when Q is a square, 51 = [— T, T} 2 . Then the above 
integrals are of the form 

/ F(x - y) dy. 

J[-T,T] 2 

This is easily recognized as the moving sum of F(-) evaluated at x. We will 
use the notation Sum(F, x, T) to denote this integral (this is simply the "moving- 
average", but without the normalization). As is well-known, this can be effi- 
ciently computed using recursion H151 [6]| . 

The main idea here is that, by using shiftable kernels, we can express (O 
using moving sums and these in turn can be computed efficiently. In particular, 
note that the number of computations required for the moving sum is indepen- 
dent of T, that is, the size of the neighborhood Q. These are referred to as the 
constant-time or 0(1) algorithms in the image processing community. The main 
steps of our algorithm are summarized in Algorithm [TJ 

Algorithm 1 Constant-time spatial filtering 
Input: f{x), ip(x) as in ([l]), and T. 

1. For 1 < n < N, use recursion to compute Sum(fip„,x,T) and 
Svm((p n ,x,T). 

2. Set f(x) as the ratio of ^ n=1 Cn{x)Siw(f(p n ,x, T) and 
En=l c n (x)Sum(ip n , x, TJ. 

Return: Filtered image f(x). 

The above idea can also be extended to non-linear filters such as the edge- 
preserving bilateral filter Ifl4l [TTl \l2li . The bilateral filtering of an image f(x) 



is given by the formula 

f(x) = -JL U{y) cj>(f(x - y) - f{x)) f{x - y) dy (3) 

VK X ) Ju 

where r)(x) = J Q <p(y) <t>(f(x - y) - fix)) dy. 

In this formula, the bivariate function ip(x) is called the spatial kernel, and 
the one-dimensional function 4>{t) is called the range kernel. Suppose that both 
these kernels are shiftable. In particular, let ip(x — r) = Yl m =i c m(T)(/? m (x), 

Plugging these into ([3]) and using linearity, we can write the numerator as 

Vc m (a;)d n (/(a;))/ <p m (y)4> n [f{x - y))f(x - y) dy. 
trz Jo. 



Similarly, 



Vi x ) = ^2 C ™( X ) d nifi X )) j Vmi x - V)4>nifi x - V)) dy . 



Algorithm 2 Constant-time bilateral filtering 
Input: fix), <p(x) and 0(s) as in (O, and T. 

1. For 1 < m < M and 1 < n < N, set a m ,ni x ) = c m (a;)(i„(/(a;)), i g mi „(a;) = 

ifrni x )(j>nifi x ))fi x ), and hm, n (x) = <p m (x)<f> n (f(x)). 

2. Use recursion to compute Sum(g m . n , x, T) and Sum(/i TOj „, cc, T). 

3. Set fix) as the ratio of J2m,n a ™,ni x ) Sum (9m,n,x,T) and 
E™,„ a m , n (x)Sum(/i m] „, cc, T). 

Return: Filtered image /(a;). 

As before, we again recognize the moving sums when Q = [— T, T] 2 . This 
gives us a new constant-time algorithm for bilateral filtering, which is summa- 
rized in Algorithm |2j We note that this algorithm is an extension of the one in 
(H, where we used shiftable kernels only for the range filter. 

The main computational advantage of Algorithms Q] and |2] is that the num- 
ber of computations per pixel does not depend on the size of the spatial or 
the range kernel. It only depends on the order of shiftability More precisely, 
the computation time scales linearly with the number of basis functions. For 
example, in case of the bilateral filter, the computation size is roughly MN 
times the computations required to perform a single moving sum of the image, 
plus the overhead time of initializing the basis images and recombining the out- 
puts. To get an estimate of the running time, we implemented Algorithm [2] in 
MATLAB on an Intel machine with a quad-core 2.83 GHz processor. We found 
that a single recursive implementation of the moving sum requires roughly 10 
milliseconds on a 512 x 512 image when T = 5. Considering an instance where 
M x JV = 3 2 x 5, the total computation time was roughly 2.5 seconds, 1 seconds 



to initialize the basis images and combine the results, and 1.5 seconds for the 
moving sums. This indeed looks promising since we can definitely bring down 
the time using a JAVA or C compiler. Moreover, we can also use multithreading 
(parallel computation) to further accelerate the implementation. 

3 Shiftable kernels 

We now address the problem of designing kernels that are shiftable. The theory 
of Lie groups can be used to study the class of shiftable functions, e.g., see dis- 
cussions in H131 [8j . It is well-known that the polynomials and the exponentials 
are essentially the only shiftable functions. 

Theorem 3.1 (The class of shiftable functions, IfTBl ). The only smooth functions 
that are shiftable are the polynomials and the exponentials, and their sum and 
product. 

We recall that a kernel is a smooth function that is symmetric, non-negative, 
and unimodal. A priori, it is not at all obvious that there exists a kernel that is 
a polynomial or exponential. Indeed, the real exponentials cannot form valid 
kernels since they are neither symmetric nor unimodal. On the other hand, 
while there are plenty of polynomials and trigonometric functions that are both 
symmetric and non-negative, it is impossible to find a one that is unimodal over 
the entire real line. This is simply because the polynomials blow up at infinity, 
while the trigonometric functions are oscillatory. 

Proposition 3.2 (Conflicting properties). There is no kernel that is shiftable on 
the entire real line. 

Nevertheless, unimodality can be achieved at least on a bounded interval, 
if not the entire real line, using polynomial and trigonometric functions. Note 
that, in practice, a priori bounds on the data are almost always available. For 
the rest of the paper, and without loss of generality, we fix the bounded interval 
to be [-T,T]. 

We now give two concrete instances of shiftable kernels on [—T,T]. The 
reason for these particular choices will be clear in the sequel. For every integer 
N = 0, 1, 2, . . ., consider the functions 



t 2 \" 
p N (t) = ( I - — 1 and q N {t) 



(<,S, 2tJ 



where t takes values in [— T, T]. It is easily verified that these are valid kernels 
on [— T, T]. The crucial difference between the above kernels is that the order of 
shiftability oip N (t) is 2N + 1, while that of gjv(*) is much lower, namely N + 1. 
We note that it is the kernel q^ (t) that was used in ■ 

The fact that the sum and product of shiftable kernels is also shiftable can 
be used to construct kernels in higher dimensions. For example, we can set 
<p(xi : x 2 ) = Pn(%i)pn(%2), or (p(xi : x 2 ) = gjvOci^Jv^). We call these the 



separable kernels. In higher dimensions, one often requires the kernel to have 
the added property of isotropy. In two dimensions, we can achieve near-isotropy 
using these separable kernels provided that N is sufficiently large. We will give 
a precise reason later in the section. However, as shown in a different context 
in [3], it is worth noting that kernels other than the standard separable ones 
(of same order) can provide better isotropy, particularly for low orders. Indeed, 
consider the following kernel on the square [— T, T] 2 : 

(f>(x 1 ,x 2 ) = qi(xi)qi f 1 _ 2 j qi{x 2 )qi f X r- 2 J . (4) 

The kernel is composed of four cosines distributed uniformly over the circle. It 
can be verified that <f>{xi , x 2 ) is more isotropic than the separable kernel of same 
order, q 2 (x i)q 2 (x 2 ). However, note that the non-negativity constraint is violated 
on the corners of the square [— T, T} 2 in this case. This is unavoidable since we 
are trying to approximate a circle within a square. Nevertheless, this does not 
create much of a problem since the negative overshoot is well within 2% of the 
peak value. Following the same argument, the polynomial 

(t>(Xl,X 2 ) =Pl(Xl)pi ( X fK 2 j Pi ( X 2)Pl f X r- 2 

tends to be more isotropic on [—T,T] 2 than p 2 {x \)p 2 {x 2 ). 

3.1 Approximation of Gaussian kernels 

The most commonly used kernel in image processing is the Gaussian kernel. 
This kernel, however, is not shiftable. As a result, we cannot directly apply 
our algorithm for the Gaussian kernel. One straightforward option is to instead 
approximate the Gaussian using its Fourier series or its Taylor polynomial, both 
of which are shiftable. The difficulty with either of these is that they do not yield 
valid kernels. That is, it is not guaranteed that the resulting approximation is 
non-negative or unimodal; see [5, Fig. 1]. This is exactly the problem with 
the polynomial approximation used for the bilateral filter in [llj]|. As against 
these, we can instead use the specialized shiftable kernels pnii) and qjv(0 to 
approximate the Gaussian to any arbitrarily precision, based on the following 
results. 

Theorem 3.3 (Gaussian approximation). For every —T <t<T, 

^oo^(^) =CXP (^)' (5) 



and 



In either case, the convergence takes place quite rapidly. The first of these 
results is well-known. For a proof of the second result, we refer the readers to 
(SJ Sec. II-D] . The added utility of these asymptotic formulas is that we can 
control the variance of these kernels using the variance of the target Gaussian. 
This is particularly useful because no simple closed-form expressions for the 
variance of pjv(i) an d <?jv(£) are available. We refer the authors to 10 Sec. II-E] 
for details on how one can exactly control the variance of qN(t). The same idea 
applies to pw(t). 

We now discuss how to approximate isotropic Gaussians in two dimensions 
using shiftable kernels. Doing this using separable kernels is straightforward. 
For example, 

*»«* [tw) qN [tw) = cxp { — &-) • (7) 

There is yet another form of convergence which is worth mentioning. Consider 
the following kernel defined on [— T, T] 2 : 



N I-7T 

r(xi,x 2 ) = n^ifv/ — (xiCOs6» fc + s 2 sin4) 



N 
>N{Xl,X 2 ) = 

fc=l 

where 9k — (k — l)ir/N. The kernel (/>4,(xi, x%) is simply the (rescaled) kernel 
4>(xi,X2) in ([4])- By slightly adapting the proof of Theorem 2.2 in [3, Appendix 
A], we can show the following. 

Theorem3.4 (Approximation of isotropic Gaussian) . For every {x 1^x2) in [—T,T] 2 , 

, , n ( K 2 (x\ + xiy 

hm 4> N {x u x 2 ) = exp — - - 

iv — >oo y »J a 

Note that the target Gaussian in this case is the same as in (0). The key 
difference, however, is that for low orders N, e.g. N — 4, 4>n{x\,X2) looks 
more isotropic than qN{xi/\/~N)qN{x2/VN)- However, as noted earlier, the 
non-negativity requirement of a kernel is mildly violated by <J>n{x\,X2) at the 
corners of the square domain. 

The significance of the order N is that it allows one to arbitrarily control the 
accuracy of the Gaussian approximation. As discussed in [5, Sec. II-E], N has to 
be greater than a threshold A^ for the approximating kernels in (0) and ([6]) to 
be non-negative and unimodal. In particular, if a 2 is the variance of the target 
Gaussian, then Ao is of the order 0(T 2 /a 2 ). Thus, a large Ao is required to 
approximate a narrow Gaussian on a large interval. For the spatial kernel, the 
ratio T 2 /a 2 is usually small, since T is small in this case. This is, however, not 
the the case for the range kernel of the bilateral filter, where T can almost be 
as large as the dynamic range of the image. The good news is that, by allowing 
for mild (and controlled) violations of the non-negativity constraint, one can 
closely approximate the Gaussian using a significantly lower number of terms. 
This is done by discarding the less significant basis functions in ([T|); see Q 
for detailed discussion and results. The same idea can also be applied to the 
polynomials. 



4 Discussion 



We have shown how, using shiftable kernels, we can express two popular forms 
of image filters (and possibly many more) in terms of simple moving sums. 
We note that we can speed up the implementation of Algorithm Q] and [2] using 
parallel computation or multithreading. In future work, we plan to implement 
these algorithms in C or JAVA, and make extensive comparison of the result and 
execution time with the state-of-the-art algorithms. 

To conclude, we note that the main idea can also be extended to some other 
forms of neighborhood filtering, e.g., the ones in H171I12H . What is even more 
interesting is that we can extend the idea for approximating the non-local means 
filter El . The non-local means is a higher order generalization of the bilateral 
filter, where one works with patches instead of single pixels. The filtered image 
f(x) in this case is given by 



/(*) = 



J f(x-y)w(x,y) dy 
fw(x,y) dy 



(8) 



where 



w 



[x,y) =exp(-/i 2 / g{u)(f{x + u) - f (x - y + m)) 2 du) 



Here g(u) is a two-dimensional Gaussian, and the integrals (sums) are taken 
over the entire image domain. In practice, though, the sum is performed locally 

in. 

It is possible to express ([8]) in terms of moving sums, using shiftable approx- 
imates of the Gaussian. First, we approximate the domain of the outer integral 
by a sufficiently large square [— T, T] 2 , and the inner integral by a finite sum 
over p neighborhood pixels ui,..., u p , where, say, Mi = 0. In this case, f(x) is 
given by 



r)(x) 



■T,T] 2 



f(x - y)cp(f{x + mi) - f(x - y + mi) 



. . . , f(x + u p ) - f(x - y + u p ))dy (9) 

where (p(ti, ■ . ■ , t p ) = exp(— h~~ 2 J2k=i 3( M fc) i fe)' anc ^ where r](x) is given by 



-T,T] 2 



<p(f(x + Mi) - f(x-y + Mi), . . . 
, f(x + u p ) - f(x-y + u p ))dy. 



(10) 



Note that <p(ti, . . . , t p ) is an anisotropic Gaussian in p variables with covariance 
diag(/i 2 /2g(Mi), . . . , h 2 /2g(u p )). Now, using a shiftable approximation (we con- 
tinue using the same symbol) of tp(t\, . . . ,t p ) as in (Q]), we can write the inte- 
grand in ([10]) as ^ =1 c n (/(x + «i), . . . , f(x + u p ))ip„(f(x-y + Ui), ...,f(x- 
y + u p )). 



We can then write the numerator in © as ^2 n c n (f(x + ui),...,f(x + 
u p ))Sum(G n ,x,T), where we set G n {x) = f(x)ip n (f(x + ui),...,f(x + u p )). 
Similarly, letting H n {x) — tp n (f(x + Ui),...,f(x + u p )), we have r](x) — 

Y,n °n(f( x + u l)' ■ • • ' f( x + u p ))Sum(H n ,x, T). 

The catch here is that it get a good approximation of non-local means we 
need to make both T and p large. While there is no problem in making T large 
(the cost of the moving sum is independent of T), it is rather difficult to make p 
large. For example, with a separable approximation of tp(ti, . . . , t p ), the overall 
order N would scale as n p , where n is the approximation order of the Gaussian 
along each dimension. This limits the scheme to coarse approximations, and 
to small neighborhoods. To make this practical, we need a a polynomial or 
trigonometric approximation of ip(ti, ■ . ■ , t p ) whose order grows slowly with p. 
It would indeed be interesting to see if such approximations exist at all. 
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